A method for comparing data splitting approaches for developing hydrological ANN models
Abstract
Data splitting is an important step in the artificial neural network (ANN) development process, in which data are divided into training, test and validation subsets to ensure that the model generalizes well. Because only one split of the data is typically used when developing an ANN model, the choice of split has a significant impact on the performance of the final model and can introduce bias and variance into the model development process. It is therefore important to find a robust data splitting method that yields an ANN model representing the underlying data generation process of a given dataset. In practice, ANN models developed using different data splitting methods are often assessed based on validation results. Previous research, however, has found that validation results alone are not adequate for assessing the performance of ANN models. Data splitting methods can bias the validation results by allocating extreme observations to the training set, so that the test and validation sets contain fewer patterns than the training set. Consequently, the generalization ability of the model may be compromised and the trained model cannot be adequately validated. This paper introduces a method for fairly comparing different data splitting methods for developing ANN models. The methodology is applied to compare a number of well-known data splitting techniques in the context of several hydrological ANN modeling problems.
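As a rough illustration of the issue the abstract raises, the sketch below performs a plain random three-way split and reports where the largest observation lands for a few different seeds. It is not the comparison method proposed in the paper. The function names, the 60/20/20 fractions and the synthetic log-normal "flow" series are all assumptions made for this example.

```python
import numpy as np


def random_three_way_split(n_samples, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle indices and cut them into training, test and validation subsets.

    The fractions are illustrative assumptions; the remainder after the
    training and test subsets goes to validation.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_test = int(round(test_frac * n_samples))
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    valid = order[n_train + n_test:]
    return train, test, valid


def subset_maxima(values, subsets):
    """Return the largest value found in each subset, in the order given."""
    return [float(values[idx].max()) for idx in subsets]


# Synthetic, skewed series standing in for a hydrological variable
# (hypothetical data, not taken from the paper).
data_rng = np.random.default_rng(42)
flow = data_rng.lognormal(mean=0.0, sigma=1.0, size=500)

for seed in range(3):
    subsets = random_three_way_split(len(flow), seed=seed)
    sizes = [len(s) for s in subsets]
    print(f"seed={seed} sizes={sizes} maxima={subset_maxima(flow, subsets)}")
```

Running the loop with different seeds shows that the subset holding the most extreme value can change from one split to the next, which is one reason a single split may give a misleading picture of validation performance.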
Similar resources
Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes
Background: The increasing prevalence of type 2 diabetes has given rise to a global health burden and to concern among health service providers and health administrators. The current study aimed to develop and compare several statistical models for identifying the risk factors associated with type 2 diabetes. To this end, artificial neural network (ANN), support vector machines (SVMs), and multi...
A novel approach to parameter uncertainty analysis of hydrological models using neural networks
In this study, a methodology has been developed to emulate a time-consuming Monte Carlo (MC) simulation by using an Artificial Neural Network (ANN) for the assessment of model parametric uncertainty. First, an MC simulation of a given process model is run. Then an ANN is trained to approximate the functional relationships between the input variables of the process model and the synthetic uncertain...
Forecasting natural gas spot prices using nonlinear nonparametric models
Developing models for accurate natural gas spot price forecasting is critical because these forecasts are useful in determining a range of regulatory decisions covering both supply and demand of natural gas or for market participants. A price forecasting modeler needs to use trial and error to build mathematical models (such as ANN) for different input combinations. This is very time consuming ...
Estimating potential evapotranspiration based on stochastic time series models (case study: Tabriz station)
Evapotranspiration is an important component of the hydrological cycle and plays a key role in irrigation system planning and in evaluating the impacts of climate change on water planning. In this study, evapotranspiration time series computed with the Penman-Monteith method at the Tabriz synoptic station were studied using linear stochastic models such as ARIMA and SARIMA. Data from 1986 to 2010 were used. After calcula...
Comparing ANN and CART to Model Multiple Land Use Changes: A Case Study of Sari and Ghaem-Shahr Cities in Iran
Most land use change (LUC) modelers have modeled binary land use change rather than multiple land use changes. As a first objective of this study, we compared two well-known LUC models, classification and regression tree (CART) and artificial neural network (ANN), drawn from two groups of data mining tools, global parametric and local non-parametric models, to model multiple LUCs. The ca...